Generally, RAG (Retrieval-Augmented Generation) is a technique in which an
LLM (Large Language Model)
is given data fetched from external sources (e.g.
vector database, SQL DB, graph DB, web search engines), and that data is fed into the generation
process.
Purpose of RAG? To give the LLM more context so that it can generate more accurate answers.
What is a vector?
What is an embedding model?
1. The Retrieval Phase
   Chunking: Raw documents are broken down into smaller, readable pieces.
   Embedding: These text chunks are converted into mathematical representations (vectors) using an embedding model.
   Vector Search: When the user asks a question, the system searches the vector db for the chunks most similar to the question.
2. The Augmentation Phase
   Once the relevant information is retrieved, it isn't just displayed. It is packaged: the system takes the user's original query and the retrieved text chunks and combines them into a specifically crafted prompt.
3. The Generation Phase
   This combined prompt (the user's query + the retrieved context) is fed into the LLM. This steers the model to synthesize an answer based on the provided external data, which reduces hallucinations.
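As a rough illustration of the retrieval phase described above, here is a minimal sketch that needs no external service. It is an assumption-laden toy: `embed` is a hypothetical bag-of-words counter, not a neural embedding model (a real model returns dense float vectors), and `vector_store` is just a Python list. The ranking idea (compare the query vector with every chunk vector by cosine similarity and keep the top k) is the same.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse word-count vector (NOT a neural model)."""
    return Counter(re.findall(r"[a-z0-9_.]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Chunking: here one log line is one chunk.
chunks = [
    "2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4",
    "2025-05-01 FIREWALL_DENY src=10.1.1.5 dst=8.8.8.8 policy=OUTBOUND_BLOCK",
    "2025-05-01 FIREWALL_DENY src=10.1.1.6 dst=1.1.1.1 policy=OUTBOUND_BLOCK",
]

# Embedding: keep each vector next to the chunk it was computed from.
vector_store = [(embed(chunk), chunk) for chunk in chunks]


def search(query: str, top_k: int = 2) -> list:
    """Vector search: return the top_k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        vector_store,
        key=lambda item: cosine_similarity(query_vector, item[0]),
        reverse=True,
    )
    return [chunk for _, chunk in ranked[:top_k]]


top_chunks = search("john.doe vpn_login_failed", top_k=1)
print(top_chunks)
```

With a real embedding model the query would not need to share exact tokens with the chunk; semantically similar text lands close together in the vector space.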
@startuml
actor admin as admin
box Retrieval Phase #LightCyan
participant em as "from llama_index.embeddings.openai \nimport OpenAIEmbedding\n\nEmbedding Model"
participant vdb as "from llama_index.core \nimport VectorStoreIndex\n\nVector DB"
end box
box Augmentation Phase #LightYellow
participant rs as "class ResponseSynthesizer\n inside query_engine"
end box
box Generation Phase #Pink
participant llm as "from llama_index.llms.openai \nimport OpenAI\n\nLLM"
end box
actor User as u
admin -> em: Feed Raw documents(data)
note over em
Create Tensors/Vectors
end note
em -> vdb: vectors
u -> rs: user_query
vdb -> rs: vectors/nodes
activate rs
rs --> rs: Create augmented_prompt \n augmented_prompt=\n (context from retrieved nodes + user_query)
deactivate rs
rs -> llm: augmented_prompt
llm -> u: Response of query
@enduml
We have log files (e.g. VPN, firewall).
The RAG pipeline reads these log files and answers the administrator's questions about them.
./logs/vpn.log
2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4
2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4
./logs/firewall.log
2025-05-01 FIREWALL_DENY src=10.1.1.5 dst=8.8.8.8 policy=OUTBOUND_BLOCK
2025-05-01 FIREWALL_DENY src=10.1.1.6 dst=1.1.1.1 policy=OUTBOUND_BLOCK
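To try the pipeline locally, the two sample files above can be recreated with a small helper. This is only a convenience sketch: `write_sample_logs` and `SAMPLE_LOGS` are hypothetical names introduced here, and the contents are exactly the four log lines shown above.

```python
from pathlib import Path

# The sample log lines from this page, keyed by file name.
SAMPLE_LOGS = {
    "vpn.log": [
        "2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4",
        "2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4",
    ],
    "firewall.log": [
        "2025-05-01 FIREWALL_DENY src=10.1.1.5 dst=8.8.8.8 policy=OUTBOUND_BLOCK",
        "2025-05-01 FIREWALL_DENY src=10.1.1.6 dst=1.1.1.1 policy=OUTBOUND_BLOCK",
    ],
}


def write_sample_logs(log_dir: str = "./logs") -> list:
    """Create log_dir if needed and write the sample log files into it."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, lines in SAMPLE_LOGS.items():
        path = directory / name
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Calling `write_sample_logs()` once before running the script creates `./logs/vpn.log` and `./logs/firewall.log`, which is the directory the script reads.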
$ cat rag_pipeline.py
import os
import dotenv
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI #Import LLM
from llama_index.core import Settings
# Load GitHub Token and set env
dotenv.load_dotenv()
if not os.getenv("GITHUB_TOKEN"):
    raise ValueError("GITHUB_TOKEN is not set")
os.environ["OPENAI_API_KEY"] = os.getenv("GITHUB_TOKEN")
os.environ["OPENAI_BASE_URL"] = "https://models.inference.ai.azure.com/"
############## 1. Retrieval Phase Start #################
## A. Set up the embedding model (a neural network that maps text to vectors)
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY"),
    api_base=os.getenv("OPENAI_BASE_URL"),
)
Settings.embed_model = embed_model
## B. Load the raw documents from ./logs (chunking happens in step C)
documents = SimpleDirectoryReader("./logs").load_data()
## C. Chunk the documents into nodes, embed each node, and
#     store the embeddings in the local in-memory vector store.
# Simplified pseudocode of what from_documents does (not the actual library source):
# def from_documents(documents, insert_batch_size=150):
#     embed_model = Settings.embed_model              # embed_model from the global Settings
#     nodes = self._chunk_documents(documents)        # chunk the documents into Nodes
#     for batch in batches(nodes, batch_size=insert_batch_size):
#         texts = [node.text for node in batch]
#         embeddings = embed_model.get_text_embedding_batch(texts)
#         # store the vectors, with each node's metadata, in the vector DB
#         self._vector_store.add(embeddings, metadata=[node.metadata for node in batch])
index = VectorStoreIndex.from_documents(documents, insert_batch_size=150)
############## Retrieval Phase End #####################
# Create LLM
llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY"),
    api_base=os.getenv("OPENAI_BASE_URL"),
)
############## 2,3. Augmentation & Generation Phase Start #################
# Simplified pseudocode of what the query engine does (not the actual library source):
# def query_engine(query_string: str):
#     ###### Augmentation Phase. Create augmented_prompt ######
#     query_tensor = Settings.embed_model.get_text_embedding(query_string)
#     top_k_nodes = self._vector_store.similarity_search(
#         query_tensor,
#         similarity_top_k=3
#     )
#     # top_k_nodes now contains the 3 most relevant text chunks (Nodes)
#     # e.g., Node 1: "2025-05-01 08:22:47 VPN_LOGIN_FAILED user=eve.hacker ip=203.0.113.7"
#     #       Node 2: "2025-05-01 08:30:55 VPN_LOGIN_SUCCESS user=carol.white ip=192.168.1.52"
#     #       Node 3: "2025-05-01 09:01:08 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4"
#     vectordb_text = [Node1][Node2][Node3]
#     augmented_prompt=
#         "Context:
#          [Node1][Node2][Node3]
#          Question: failed vpn logins for 2 hours after 2025-05-01 09:01:08
#          Answer:"
#     ###### Generation Phase. Send augmented_prompt to the LLM ######
#     return llm.complete(augmented_prompt)
query_engine = index.as_query_engine(
    llm=llm
)
response = query_engine.query("Show firewall policies blocking outbound traffic")
print(response)
# Response=
# The firewall policies blocking outbound traffic are as follows:
# 1. Policy: OUTBOUND_BLOCK
#    - Source: 10.1.1.5
#    - Destination: 8.8.8.8
# 2. Policy: OUTBOUND_BLOCK
#    - Source: 10.1.1.6
#    - Destination: 1.1.1.1
response = query_engine.query("Why is john.doe unable to connect to VPN?")
print(response)
# Response=
# john.doe is unable to connect to the VPN due to repeated login failures, as
# indicated by the log entries showing two instances of VPN_LOGIN_FAILED for the user.
############## Augmentation & Generation Phase End #################
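The augmentation step sketched in the comments above can also be written as a few lines of plain Python. This is a minimal sketch under assumptions: `build_augmented_prompt` is a hypothetical helper, and the "Context / Question / Answer" layout follows the pseudocode on this page, not the prompt template that LlamaIndex actually uses internally.

```python
def build_augmented_prompt(user_query: str, nodes: list) -> str:
    """Combine the retrieved text chunks and the user's question into one prompt."""
    context = "\n".join(nodes)
    return (
        "Context:\n"
        f"{context}\n"
        f"Question: {user_query}\n"
        "Answer:"
    )


# Text chunks as they might come back from the vector search.
retrieved_nodes = [
    "2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4",
    "2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4",
]

prompt = build_augmented_prompt(
    "Why is john.doe unable to connect to VPN?",
    retrieved_nodes,
)
print(prompt)
```

The resulting string is what would be sent to the LLM in the generation phase; the model sees the retrieved log lines as plain text, not as vectors.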